Cache the mapping of sequence to log block index in transaction log iterator #12538

HypenZou · 2024-04-15T11:23:11Z

Context/Summary:

The official documentation recommends using the GetUpdatesSince method for master-slave synchronization. Every time this method called, it will search for the start sequence from the beginning of the target WAL log. It may costs too much to seek to start sequence when wal file size is large. If high requirements are placed on master-slave delay, SeekToStartSequence() may be called very frequently and cause a large amount of read IO (although in most cases read page cache).
There are some issues discussing about it:
#12516
#889

In most scenarios, the start sequences of the GetUpdatesSince method are continuous. Caching the previously seek position can avoid most repeated readings of the wal file.

This commit will cache the mapping of sequence to WAL file block index. It does this both after seeking to the start sequence of the iterator and after iterating to the end of the iterator. When seeking to start sequence, the log reader will lookup the first batch sequence that is not larger than the target sequence and then skip wal file read pointer to the start of the block, instead of seeking from the start.

The optimization effect on the getupdatesSince interface is quite obvious

Test:
covered by ./db_log_iter_test

HypenZou · 2024-05-07T20:03:15Z

@ajkr @jowlyzhang hi, could you please review my code when you have time? Any comments or suggestions are welcome ~

reusing the iterator may be a workaround, but it may be weird and undefined to use a iterator after it is invalid. Additionally, the comments for the Next() method state that it requires Valid() to be true.

jsteemann · 2024-06-12T07:46:20Z

We have also observed that WAL tailing using consecutive calls to GetUpdatesSince() with increasing sequence number offsets result in "random" latency. The latency seems to depend on how many WAL records need to be skipped for the call, which is exactly what would be addressed by this PR. So this would be a very welcome improvement for us, and we would be very happy if this could be included in one of the next releases.

cbi42 · 2024-06-14T23:37:15Z

db/log_reader.cc

-          ReportCorruption(fragment.size(),
-                           "missing start of fragmented record(2)");
+          if (!skipped_) {
+            ReportCorruption(fragment.size(),


Can also clear skipped_ here.

done, thx for review

cbi42 · 2024-06-15T00:20:50Z

db/db_log_iter_test.cc

@@ -330,6 +330,26 @@ TEST_F(DBTestXactLogIterator, TransactionLogIteratorBlobs) {
      "Delete(0, key2)",
      handler.seen);
 }
+
+TEST_F(DBTestXactLogIterator, TransactionIteratorCache) {


This test would pass before this change too. Maybe check that cache is effective by check some statistics on IO.

update it, thx for review

cbi42 · 2024-06-15T00:24:18Z

db/log_reader.cc

@@ -130,13 +132,16 @@ bool Reader::ReadRecord(Slice* record, std::string* scratch,
        prospective_record_offset = physical_record_offset;
        scratch->assign(fragment.data(), fragment.size());
        in_fragmented_record = true;
+        skipped_ = false;


Need to also clear after kUserDefinedTimestampSizeType.

done, thx for review

cbi42 · 2024-06-15T00:26:36Z

db/log_reader.h

@@ -170,6 +185,10 @@ class Reader {
  // is only for WAL logs.
  UnorderedMap<uint32_t, size_t> recorded_cf_to_ts_sz_;

+  // if log reader is skipped, may need to drop bytes
+  // until seek to the first of a record
+  bool skipped_;


Consider changing the name to something like skip_first_fragmented_record_ since this field will be set to false after skipping a record.

done ✅ , thx for reveiw

cbi42 · 2024-06-15T00:27:37Z

include/rocksdb/transaction_log.h

+    // if true, the mapping of db sequence to WAL file block index will be
+    // cached. This prevents repeated reading from the beginning of the
+    // target wal log when GetUpdatesSince() is called.
+    // Default: true


We should not enable this by default since this won't work with WAL compression, where the first record sets the compression type.

done , thx for reveiw~
and now will return not support err when wal compression and with_cache are both set

cbi42 · 2024-06-15T00:29:35Z

db/transaction_log_impl.cc

+  if (read_options_.with_cache_) {
+    // cache the mapping of sequence to log block index when seeking to the
+    // start or end sequence
+    if ((current_batch_seq_ <= starting_sequence_number_ &&


What's the use case to cache for the start sequence?

When there are multiple slaves catching up, the start sequence may not necessarily be incremental; Also, there may be scenarios that do not iterate to the end sequence but calling GetUpdateSince frequently. Cache at the beginning and end sounds reasonable for me

cbi42 · 2024-06-15T00:31:14Z

db/transaction_log_impl.h

+    uint64_t block_index;
+    SeqWithFileBlockIdx(uint64_t log_num, uint64_t seq_num, uint64_t block_idx)
+        : log_number(log_num), seq_number(seq_num), block_index(block_idx){};
+    bool operator<(const SeqWithFileBlockIdx& other) const {


Would block_index be different with the same log_number and seq_number?

There shouldn't be such a situation, ~~but retaining the judgment for block_idx seems no problem~~
update: remove the judgement and add assertion to ensure it's same.

cbi42 · 2024-06-15T00:32:09Z

db/transaction_log_impl.h

+    std::lock_guard<std::mutex> lk{mutex_};
+    if (cache_.size() > size_) {
+      auto iter = cache_.begin();
+      std::advance(iter, rand_.Next() % cache_.size());


What about always erasing the oldest one? I assume GetUpdatesSince() is called with likely increasing sequence numbers.

has changed to replace the oldest one. thx for the advice

…terator

facebook-github-bot added the CLA Signed label Apr 15, 2024

HypenZou changed the title ~~Cache sequence to log block index in transaction log iterator~~ Cache the mapping of sequence to log block index in transaction log iterator Apr 15, 2024

HypenZou force-pushed the get-update-since branch 2 times, most recently from f26e11d to de39db5 Compare April 16, 2024 05:40

HypenZou force-pushed the get-update-since branch 2 times, most recently from ae84b5c to d1ad917 Compare May 7, 2024 16:39

jowlyzhang self-requested a review May 13, 2024 22:48

HypenZou force-pushed the get-update-since branch from d1ad917 to 06ab2da Compare May 22, 2024 03:03

cbi42 requested changes Jun 15, 2024

View reviewed changes

HypenZou force-pushed the get-update-since branch 6 times, most recently from 0f276f3 to 89231e3 Compare June 18, 2024 15:32

HypenZou requested a review from cbi42 June 18, 2024 16:21

HypenZou force-pushed the get-update-since branch 2 times, most recently from a382537 to 0da6a66 Compare June 20, 2024 19:37

Cache the mapping of sequence to log block index in transaction log i…

ec7a7e4

…terator

HypenZou force-pushed the get-update-since branch from 0da6a66 to ec7a7e4 Compare June 20, 2024 20:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cache the mapping of sequence to log block index in transaction log iterator #12538

Cache the mapping of sequence to log block index in transaction log iterator #12538

HypenZou commented Apr 15, 2024 •

edited

Loading

HypenZou commented May 7, 2024 •

edited

Loading

jsteemann commented Jun 12, 2024

cbi42 Jun 14, 2024

HypenZou Jun 18, 2024

cbi42 Jun 15, 2024

HypenZou Jun 18, 2024

cbi42 Jun 15, 2024

HypenZou Jun 18, 2024

cbi42 Jun 15, 2024

HypenZou Jun 18, 2024 •

edited

Loading

cbi42 Jun 15, 2024

HypenZou Jun 18, 2024 •

edited

Loading

cbi42 Jun 15, 2024

HypenZou Jun 18, 2024

cbi42 Jun 15, 2024

HypenZou Jun 18, 2024 •

edited

Loading

cbi42 Jun 15, 2024

HypenZou Jun 18, 2024

Cache the mapping of sequence to log block index in transaction log iterator #12538

Are you sure you want to change the base?

Cache the mapping of sequence to log block index in transaction log iterator #12538

Conversation

HypenZou commented Apr 15, 2024 • edited Loading

HypenZou commented May 7, 2024 • edited Loading

jsteemann commented Jun 12, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

HypenZou Jun 18, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

HypenZou Jun 18, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

HypenZou Jun 18, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

HypenZou commented Apr 15, 2024 •

edited

Loading

HypenZou commented May 7, 2024 •

edited

Loading

HypenZou Jun 18, 2024 •

edited

Loading

HypenZou Jun 18, 2024 •

edited

Loading

HypenZou Jun 18, 2024 •

edited

Loading